Skip to content

Cleanup public interface #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Cleanup public interface #589

wants to merge 19 commits into from

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented May 23, 2025

This PR does the following:

  1. Implements parse_tsv() and parse_csv() as aliases for parse_sssom_table that don't try and guess the right separator, which makes it easier to understand exactly what the function is for
  2. Guts and rewrites private separator inference and file opening utilities
  3. Exposes functions for parsing SSSOM at the top level, i.e. so you can do sssom.parse_tsv() instead of needing to know it's hiding in sssom.writers.parse_tsv
  4. Get rid of the sssom.parse function which just wraps pd.read_csv(), and is only used by a test
  5. Quality of life improvement: enables all write functions to take in a file path (in addition to being able to taking a TextIO)
  6. Update README to show off high level I/O functionality in a "getting started" section
  7. Improve docs and typing

Demo

import sssom

# other SSSOM files can be found on https://mapping-commons.github.io
url = "https://raw.githubusercontent.com/mapping-commons/mh_mapping_initiative/master/mappings/mp_hp_eye_impc.sssom.tsv"

# TSV can be parsed into a mapping set dataframe object,
# which includes a pandas DataFrame, a curies.Converter,
# and metadata
msdf = sssom.parse_tsv(url)

# SSSOM comes with several "write" functions
sssom.write_tsv(msdf, "test.tsv")
sssom.write_json(msdf, "test.json")
sssom.write_owl(msdf, "test.owl")
sssom.write_rdf(msdf, "test.ttl")

@cthoyt cthoyt marked this pull request as ready for review May 23, 2025 09:20
@cthoyt cthoyt requested a review from matentzn May 23, 2025 09:36
@cthoyt
Copy link
Member Author

cthoyt commented May 23, 2025

@matentzn @joeflack4 I'd appreciate a review!

@cthoyt cthoyt requested a review from joeflack4 May 23, 2025 09:36
Copy link
Collaborator

@joeflack4 joeflack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cthoyt Thanks for adding these aliases! Other misc changes and docs look good, too.

Just one more thing on my wishlist: method/func names and locations

joeflack4
joeflack4 previously approved these changes May 24, 2025
@cthoyt cthoyt enabled auto-merge (squash) May 24, 2025 19:36
msdf = parse_sssom_table(stream)
input_path = test_data_dir.joinpath("basic.tsv")
with input_path.open() as file:
msdf = parse_sssom_table(file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intention of simplifying the code here (no need to read the entire contents of basic.tsv into a string, turn the string into a file-like StringIO object, and then pass that object to parse_sssom_table, when that function can take a pathname directly), but in this instance I believe this is ill-advised.

Judging from the name of this test (test_parse_sssom_dataframe_from_stringio), it is specifically intended to check that you can use parse_sssom_table on a StringIO object, rather than a file descriptor or a pathname. Now it is no longer testing that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, so I updated this in aabdb93.

Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like what I see. @cthoyt can you address @gouttegd comment on the test, I will deal with merging and publishing this! THANK YOU!

@cthoyt
Copy link
Member Author

cthoyt commented Jun 15, 2025

@matentzn done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants